
    How many crowdsourced workers should a requester hire?

    Recent years have seen increased interest in crowdsourcing as a way of obtaining information from a potentially large group of workers at a reduced cost. The crowdsourcing process, as we consider it in this paper, is as follows: a requester hires a number of workers to work on a set of similar tasks, each worker completes the tasks and reports back outputs, and the requester then aggregates the reported outputs. A crucial question arises during this process: how many crowd workers should a requester hire? In this paper, we investigate from an empirical perspective the optimal number of workers a requester should hire when crowdsourcing tasks, with a particular focus on the crowdsourcing platform Amazon Mechanical Turk. Specifically, we report the results of three studies involving different tasks and payment schemes. We find that both the expected error in the aggregate outputs and the risk of a poor combination of workers decrease as the number of workers increases. Surprisingly, we find that the optimal number of workers a requester should hire for each task is around 10 to 11, regardless of the underlying task and payment scheme. To derive this result, we employ a principled analysis based on bootstrapping and segmented linear regression. Beyond this result, we also find that workers who perform best overall are more consistent across multiple tasks than other workers. Our results thus contribute to a better understanding of, and provide new insights into, how to design more effective crowdsourcing processes.
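
    A minimal sketch of the kind of analysis the abstract names: bootstrap the expected error for each crowd size, then locate the elbow with a segmented (two-piece) linear fit. The synthetic worker outputs, error metric, and breakpoint search below are illustrative assumptions, not the paper's actual pipeline:

        # Bootstrap the expected aggregation error for each crowd size k,
        # then find the breakpoint of a two-segment linear fit to the curve.
        import numpy as np

        rng = np.random.default_rng(0)
        truth = 0.0
        workers = rng.normal(truth, 1.0, size=100)  # one output per worker (synthetic)

        def bootstrap_error(k, n_boot=2000):
            """Mean absolute error of the mean of k workers drawn with replacement."""
            samples = rng.choice(workers, size=(n_boot, k), replace=True)
            return np.abs(samples.mean(axis=1) - truth).mean()

        ks = np.arange(1, 31)
        errs = np.array([bootstrap_error(k) for k in ks])

        def segmented_rss(bp):
            """Residual sum of squares of two separate linear fits split at bp."""
            rss = 0.0
            for mask in (ks <= bp, ks > bp):
                coef = np.polyfit(ks[mask], errs[mask], 1)
                rss += ((np.polyval(coef, ks[mask]) - errs[mask]) ** 2).sum()
            return rss

        # The best-fitting breakpoint is where adding workers stops paying off
        # (the paper reports roughly 10 to 11 on real tasks).
        best_bp = min(range(3, 28), key=segmented_rss)
        print("estimated breakpoint:", best_bp)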

    RuBQ: A Russian Dataset for Question Answering over Wikidata

    The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, and a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and subsequent in-house verification. The freely available dataset will be of interest to a wide community of researchers and practitioners in the areas of Semantic Web, NLP, and IR, especially those working on multilingual question answering. The proposed dataset generation pipeline proved to be efficient and can be employed in other data annotation projects.
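
    For illustration, RuBQ-style questions resolve to SPARQL queries over Wikidata. The query below is a generic example of that pattern ("What did Douglas Adams write?"); it is not taken from the dataset, and the Russian label fallback merely mirrors the dataset's bilingual setting:

        # Run a simple Wikidata SPARQL query of the kind RuBQ pairs with questions.
        import requests

        QUERY = """
        SELECT ?work ?workLabel WHERE {
          ?work wdt:P50 wd:Q42 .   # P50 = author, Q42 = Douglas Adams
          SERVICE wikibase:label { bd:serviceParam wikibase:language "ru,en". }
        }
        LIMIT 10
        """

        resp = requests.get(
            "https://query.wikidata.org/sparql",
            params={"query": QUERY, "format": "json"},
            headers={"User-Agent": "rubq-example/0.1 (illustrative)"},
        )
        for row in resp.json()["results"]["bindings"]:
            print(row["workLabel"]["value"])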

    Cooperation and Contagion in Web-Based, Networked Public Goods Experiments

    A longstanding idea in the literature on human cooperation is that cooperation should be reinforced when conditional cooperators are more likely to interact. In the context of social networks, this idea implies that cooperation should fare better in highly clustered networks, such as cliques, than in networks with low clustering, such as random networks. To test this hypothesis, we conducted a series of web-based experiments in which 24 individuals played a local public goods game arranged on one of five network topologies that varied between disconnected cliques and a random regular graph. In contrast with previous theoretical work, we found that network topology had no significant effect on average contributions. This result implies either that individuals are not conditional cooperators, or else that cooperation does not benefit from positive reinforcement between connected neighbors. We then tested both of these possibilities in two subsequent series of experiments in which artificial seed players were introduced, making either full or zero contributions. First, we found that although players did generally behave like conditional cooperators, they were as likely to decrease their contributions in response to low-contributing neighbors as they were to increase their contributions in response to high-contributing neighbors. Second, we found that positive effects of cooperation were contagious only to direct neighbors in the network. In total, we report on 113 human-subjects experiments, highlighting the speed, flexibility, and cost-effectiveness of web-based experiments over those conducted in physical labs.
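
    A toy dynamics sketch of the setup: a local public goods game played on disconnected cliques versus a random regular graph. The update rule (match your neighbors' average contribution, shaded down slightly) is a stylized stand-in for conditional cooperation, not the paper's estimated behavioral model:

        # Compare average contributions under a conditional-cooperation rule
        # on two of the five topologies studied (24 players, degree 5).
        import networkx as nx
        import numpy as np

        def run_game(G, rounds=10, endowment=10, seed=0):
            rng = np.random.default_rng(seed)
            contrib = {v: float(rng.uniform(0, endowment)) for v in G}
            for _ in range(rounds):
                # Conditional cooperation with a self-serving bias: match the
                # neighbors' mean contribution, minus a small shading term.
                nbr_mean = {v: np.mean([contrib[u] for u in G[v]]) for v in G}
                contrib = {v: max(0.0, nbr_mean[v] - 0.5) for v in G}
            return np.mean(list(contrib.values()))

        n, d = 24, 5
        cliques = nx.disjoint_union_all([nx.complete_graph(d + 1) for _ in range(n // (d + 1))])
        random_reg = nx.random_regular_graph(d, n, seed=1)
        print("cliques:", run_game(cliques), "random regular:", run_game(random_reg))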

    Crowdcloud: A Crowdsourced System for Cloud Infrastructure

    The widespread adoption of truly portable, smart devices and Do-It-Yourself computing platforms by the general public has enabled the rise of new network and system paradigms. This abundance of well-connected, well-equipped, affordable devices, when combined with crowdsourcing methods, enables the development of systems with the aid of the crowd. In this work, we introduce the paradigm of Crowdsourced Systems: systems whose constituent infrastructure, or a significant part of it, is pooled from the general public by following crowdsourcing methodologies. We discuss the distinctive characteristics they carry and provide their “canonical” architecture. We exemplify the paradigm by introducing Crowdcloud, a crowdsourced cloud infrastructure in which crowd members can act both as cloud service providers and as cloud service clients. We discuss its characteristic properties and provide its functional architecture. The concepts introduced in this work underpin recent advances in the areas of mobile edge/fog computing and co-designed/co-created systems.
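
    A minimal sketch of the dual role at the heart of Crowdcloud, where the same crowd member can both pool resources and consume services; the class and field names are illustrative, not the paper's functional architecture:

        # A crowd member can be provider and client at the same time.
        from dataclasses import dataclass, field

        @dataclass
        class CrowdMember:
            member_id: str
            offered_cpu_cores: int = 0                  # capacity pooled into the cloud
            requested_services: list = field(default_factory=list)

            @property
            def is_provider(self) -> bool:
                return self.offered_cpu_cores > 0

            @property
            def is_client(self) -> bool:
                return bool(self.requested_services)

        alice = CrowdMember("alice", offered_cpu_cores=4, requested_services=["storage"])
        print(alice.is_provider and alice.is_client)    # True: both roles at once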

    Conducting interactive experiments online

    Online labor markets provide new opportunities for behavioral research, but conducting economic experiments online raises important methodological challenges. This holds particularly for interactive designs. In this paper, we provide a methodological discussion of the similarities and differences between interactive experiments conducted in the laboratory and online. To this end, we conduct a repeated public goods experiment, with and without punishment, using samples from the laboratory and the online platform Amazon Mechanical Turk. We chose to replicate this experiment because it is long and logistically complex; it therefore provides a good case study for discussing the methodological and practical challenges of online interactive experimentation. We find that the basic behavioral patterns of cooperation and punishment observed in the laboratory replicate online. The most important challenge of online interactive experiments is participant dropout. We discuss measures for reducing dropout and show that, for our case study, dropouts are exogenous to the experiment. We conclude that data quality for interactive experiments conducted via the Internet is adequate and reliable, making online interactive experimentation a potentially valuable complement to laboratory studies.
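
    The workhorse design here is the repeated linear public goods game with punishment. A standard payoff specification from this literature (the study's exact parameters may differ) is

        \pi_i = e - c_i + \frac{m}{n}\sum_{j=1}^{n} c_j - \kappa \sum_{j \neq i} p_{ij} - \beta \sum_{j \neq i} p_{ji},

    where e is the endowment, c_i is player i's contribution, m > 1 is the group multiplier (with m/n < 1, so free-riding is individually optimal), p_{ij} are the punishment points i assigns to j, \kappa is the per-point cost of punishing, and \beta is the per-point payoff reduction from being punished.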

    Are all ‘research fields’ equal? Rethinking practice for the use of data from crowd-sourcing market places

    New technologies like large-scale social media sites (e.g., Facebook and Twitter) and crowdsourcing services (e.g., Amazon Mechanical Turk, Crowdflower, Clickworker) impact social science research and provide many new and interesting avenues for research. The use of these new technologies for research has not been without challenges, and a recently published psychological study on Facebook led to a widespread discussion on the ethics of conducting large-scale experiments online. Surprisingly little has been said about the ethics of conducting research using commercial crowdsourcing marketplaces. In this paper, I focus on the ethical questions raised by data collection with crowdsourcing tools. I briefly draw on implications of internet research more generally and then focus on the specific challenges that research with crowdsourcing tools faces. I identify fair pay and related issues of respect for autonomy, as well as problems with power dynamics between researcher and participant, which have implications for ‘withdrawal-without-prejudice’, as the major ethical challenges with crowdsourced data. Further, I draw attention to how we can develop a ‘best practice’ for researchers using crowdsourcing tools.

    ACRyLIQ: Leveraging DBpedia for adaptive crowdsourcing in linked data quality assessment

    Crowdsourcing has emerged as a powerful paradigm for quality assessment and improvement of Linked Data. A major challenge of employing crowdsourcing for quality assessment in Linked Data is the cold-start problem: how to estimate the reliability of crowd workers and assign the most reliable workers to tasks? We address this challenge by proposing a novel approach for generating test questions from DBpedia based on the topics associated with quality assessment tasks. These test questions are used to estimate the reliability of new workers. Subsequently, tasks are dynamically assigned to reliable workers to help improve the accuracy of collected responses. Our proposed approach, ACRyLIQ, is evaluated using workers hired from Amazon Mechanical Turk on two real-world Linked Data datasets. We validate the proposed approach in terms of accuracy and compare it against the baseline approach of reliability estimation using gold-standard tasks. The results demonstrate that our proposed approach achieves high accuracy without using gold-standard tasks.
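
    A hedged sketch of the cold-start idea: score new workers on the generated test questions, then route quality-assessment tasks to the highest-scoring workers. The scoring and assignment rules below are simplifications, not the paper's algorithm:

        # Estimate worker reliability from test questions, then assign tasks.

        def reliability(answers, gold):
            """Fraction of DBpedia-derived test questions answered correctly."""
            return sum(1 for q, a in answers.items() if gold.get(q) == a) / len(gold)

        def assign(task, workers, scores, k=3):
            """Route the task to the k workers with the highest reliability."""
            return sorted(workers, key=lambda w: scores[w], reverse=True)[:k]

        gold = {"q1": "yes", "q2": "no", "q3": "yes"}   # hypothetical test questions
        scores = {
            "w1": reliability({"q1": "yes", "q2": "no", "q3": "yes"}, gold),  # 1.0
            "w2": reliability({"q1": "no", "q2": "no", "q3": "yes"}, gold),   # ~0.67
            "w3": reliability({"q1": "no", "q2": "yes", "q3": "no"}, gold),   # 0.0
        }
        print(assign("triple-quality-check", list(scores), scores, k=2))      # ['w1', 'w2']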